Optimal Strategy under Unknown Stochastic Environment: Nonparametric Lob-Pass Problem
Authors
K. Hiraoka, S. Amari
Abstract
The bandit problem consists of two factors: exploration, the collection of information on the environment, and exploitation, taking benefit by choosing the optimal action in the uncertain environment. Exploitation requires choosing only the optimal actions, while exploration requires taking a variety of (non-optimal) actions as trials. Hence, in order to obtain the maximal cumulative gain, we need to strike a compromise between the exploration and exploitation processes. We treat a situation where our actions change the structure of the environment, of which a simple example is formulated as the lob-pass problem by Abe and Takeuchi. Usually, in the bandit problem the environment is specified by a finite number of unknown parameters, so that the information collection part is to estimate their true values. The present paper treats a more realistic situation of nonparametric estimation of the environment structure, which includes an infinite number (a functional degree of freedom) of unknown parameters. The asymptotically optimal strategy is given under such a circumstance, proving that the cumulative loss can be made of the order O(t^α), where α is an arbitrarily small constant (α > 0) and t is the number of trials, in contrast with the optimal order O(log t) in the parametric case.

Index Terms: bandit problem, stochastic game, optimal strategy, nonparametric estimation, stochastic approximation.

K. Hiraoka is with the Department of Information Engineering, University of Tokyo, Tokyo 113, Japan. S. Amari is with the Department of Information Engineering, University of Tokyo, Tokyo 113, Japan. He is also with the RIKEN Frontier Research Program on Brain Information Processing, RIKEN, Wako, Japan.
Similar resources
Stochastic Game under Unknown Environment: A Strategy for Nonparametric Lob-Pass Problem
We treat an on-line learning model called the lob-pass problem, which is an extension of the bandit problem. The nonparametric case is considered, and a class of strategies that achieve O(t^α) cumulative regret for arbitrary α > 0 is constructed. It is also shown that no strategy can achieve O(log t).
Optimal production strategy of bimetallic deposits under technical and economic uncertainties using stochastic chance-constrained programming
In order to catch up with reality, all the macro-decisions related to long-term mining production planning must be made simultaneously and under uncertain conditions of determinant parameters. By taking advantage of the chance-constrained programming, this paper presents a stochastic model to create an optimal strategy for producing bimetallic deposit open-pit mines under certain and uncertain ...
Optimal flexible capacity in newsboy problem under stochastic demand and lead-time
In this paper, we consider a newsvendor who is going to invest in dedicated or flexible capacity. Our goal is to find the optimal investment policy to maximize total profit while the newsvendor faces uncertainty in lead time and demand simultaneously. In the literature, demand is typically treated as stochastic while lead time is treated as constant. However, in reality lead time uncertainty decreases newsvendor's...
A Project Scheduling Method Based on Fuzzy Theory
In this paper, a new method based on fuzzy theory is developed to solve the project scheduling problem under a fuzzy environment. Assuming that the durations of activities are trapezoidal fuzzy numbers (TFNs), in this method we compute several project characteristics, such as earliest times, latest times, and slack times, in terms of TFNs. In this method, we introduce a new approach which we call modif...
Intelligent Path Planning in Unknown Environments with Vision-like Sensors
In this work we present a methodology for intelligent path planning in an uncertain environment using vision-like sensors, i.e., sensors that allow the sensing of the environment non-locally. Examples would include a mobile robot exploring an unknown terrain or a micro-UAV navigating in a cluttered urban environment. We show that the problem of path planning in an uncertain environment, under c...
Publication date: 1995